ABSTRACT

Spatially-varying depth and characteristics of observing conditions, such as seeing, airmass, 
or sky background, are major sources of systematic uncertainties in modern galaxy survey 
analyses, in particular in deep multi-epoch surveys. We present a framework to extract and 
project these sources of systematics onto the sky, and apply it to the Dark Energy Survey 
(DES) to map the observing conditions of the Science Verification (SV) data. The resulting 
distributions and maps of sources of systematics are used in several analyses of DES SV to 
perform detailed null tests with the data, and also to incorporate systematics in survey
simulations. We illustrate the complementarity of these two approaches by comparing the SV
data with the BCC-UFig, a synthetic sky catalogue generated by forward-modelling of the DES
SV images. We analyse the BCC-UFig simulation to construct galaxy samples mimicking
those used in SV galaxy clustering studies. We show that the spatially-varying survey depth
imprinted in the observed galaxy densities and the redshift distributions of the SV data are
successfully reproduced by the simulation and well-captured by the maps of observing
conditions. The combined use of the maps, the SV data and the BCC-UFig simulation allows us
to quantify the impact of spatial systematics on N(z), the redshift distributions inferred using
photometric redshifts. We conclude that spatial systematics in the SV data are mainly due 
to seeing fluctuations and are under control in current clustering and weak lensing analyses. 
However, they will need to be carefully characterised in upcoming phases of DES in order to 
avoid biasing the inferred cosmological results. The framework presented here is relevant to 
all multi-epoch surveys, and will be essential for exploiting future surveys such as the Large 
Synoptic Survey Telescope, which will require detailed null-tests and realistic end-to-end
image simulations to correctly interpret the deep, high-cadence observations of the sky.

Key words: precision cosmology, galaxy surveys, spatial systematics, image simulations

1 INTRODUCTION

The Dark Energy Survey (DES, The Dark Energy Survey Collaboration
2005) began in 2012 and will observe during at least five
seasons to cover ~5000 square degrees of the Southern sky, in five
optical bands (grizY). When completed, DES will cover a volume
of the Universe up to 20 times greater than the Sloan Digital Sky
Survey (SDSS, Gunn et al. 2006), the largest optical survey to date.
Hence, DES will provide an enormous legacy data set useful in a
range of astrophysical and cosmological studies. It is thus essential
to develop approaches to robustly analyse DES data while accounting
for statistical and systematic uncertainties.

The primary science goal of DES is to uncover the nature
of dark energy using a combination of cosmological observables.
In addition to expansion rate measurements using supernova light
curves, DES will rely on probes of the growth rate such as the
clustering and gravitational lensing of galaxies and clusters of
galaxies. Exploiting these observables to probe dark energy requires
exquisite control over the spatial coverage and calibration of the
survey. Spatial fluctuations in the depth or quality of the data
(e.g., the properties of the sky noise, the photometry, or galaxy
ellipticity measurements) can impact the galaxy catalogues and lead
to systematic biases in cosmological analyses. All ongoing and
future surveys will be limited by our ability to identify and mitigate
such systematics.

Establishing an exhaustive list of the sources of potential
systematics in cosmological measurements is beyond the scope of this
paper. However, it is worth recalling that systematics in clustering
and cosmic shear studies are mostly rooted in astrophysical
foregrounds (extinction by dust or obscuration by bright stars),
observing conditions (e.g., seeing, sky noise, airmass), or processing
and calibration (such as the quality of the photometry or the point
spread function). These affect the probability of detecting sources
and also their properties, yielding non-trivial distortions in the
reduced data, in particular the galaxy catalogues. In DES, various
efforts are dedicated to modelling or capturing the complicated
transfer function connecting the raw data to the final galaxy
catalogues. For instance, the Ultra Fast Image Generator (UFig, Berge
et al. 2013) is used to create simulated DES images, which are then
processed in a similar manner to the real data. This approach has
been investigated, e.g., to characterise systematics in shear
measurements (Bruderer et al. 2015). UFig was also interfaced with
the BCC N-body simulations (Busha et al. 2013) by Chang et al.
(2014) in order to forward-model the survey transfer function with
known underlying astrophysics and cosmology. In this paper, we
test this transfer function and investigate how well the BCC-UFig
is able to reproduce physical characteristics (e.g., redshift
distributions) and systematics (e.g., spurious galaxy density fluctuations)
found in the DES Science Verification (SV) data. By contrast,
Balrog^ (Suchyta et al. 2015, used in Melchior et al. 2015) takes the
approach of populating real DES images with simulated galaxies
in order to measure the effective transfer function of the survey.
These complementary efforts will be improved in the coming years
to fully exploit DES data.

The observing conditions and astrophysical foregrounds
unavoidably vary across the survey footprint (e.g., nightly variations
of seeing, or colour reddening by Galactic dust). This paper
focuses on mapping these sources of systematics onto the sky. This
operation is analogous to the construction of foreground templates
for the analysis of cosmic microwave background (CMB) data
(e.g., Tegmark 1997; Slosar, Seljak & Makarov 2004; Ade et al.
2014). Such templates are used in numerous analyses of single-epoch
surveys like SDSS, in particular galaxy and quasar clustering
measurements (e.g., Tegmark et al. 1998; Scranton et al. 2002;
Ross et al. 2012; Ho et al. 2012; Leistedt et al. 2013; Leistedt &
Peiris 2014; Agarwal et al. 2014), and are being used in analyses
of SV data (e.g., Vikram et al. 2015; Crocce et al. 2015; Jarvis
et al. 2015; Becker et al. 2015; Giannantonio et al. 2015). More
generally, templates of potential sources of systematics can be used
to carry out null tests with the data or model their contamination.
As detailed below, multi-epoch surveys such as DES require a
dedicated projection framework. In addition, the extracted observing
conditions can be incorporated in image simulations to mimic the
survey properties.

^ https://github.com/emhuff/Balrog

This paper is organised as follows. In Section 2 we present a
scheme to map multi-epoch survey data onto the sky, and apply it to
DES SV data. We present and analyse the resulting maps of sources
of observational systematics. In Section 3 we use these maps to
analyse the SV data and the BCC-UFig simulations, and show the
impact of observational systematics on the measured galaxy
densities and on the redshift distributions inferred using photometric
redshifts. In Section 4 we conclude and discuss the impact and
future extensions of this work.


2 MAPPING THE PROPERTIES OF DES-SV IMAGES 
2.1 Geometrical projection 

Mapping potential sources of systematics, such as observing
conditions, is a routine operation in modern galaxy surveys. For the
SDSS, this mapping was relatively straightforward since SDSS was
a single-epoch survey. Therefore, a direct mapping between sky
position and images could be established^ (e.g., Ross et al. 2011,
2012; Leistedt & Peiris 2014). In other words, any of the
properties of SDSS images (e.g., seeing) directly project onto the sky.
This is no longer the case for DES, which is a multi-epoch survey
where several single-epoch CCD images are processed and stacked
into ‘coadd’ images, from which galaxies and stars are then
extracted. The nominal depth in the main DES survey requires up
to ten tilings in each band, while deeper regions require an order
of magnitude more (i.e., in the SN fields, which are dedicated to
the DES supernova programme). The coadding process is done in
non-overlapping regions called ‘tiles’, which are 0.75 × 0.75 deg^2
squares constructed to cover and uniquely decompose the entire
DES footprint. As a consequence of the multi-epoch nature of DES,
there is not a unique value of e.g., seeing at each sky position,
but rather a distribution of values corresponding to the coadded
single-epoch images. This is illustrated in Figure 1, which shows
the footprints and properties of a set of single-epoch images used
in an arbitrary coadd, part of the DES-SV data (described in the
next section). The seeing, airmass and background noise (as well as
many other properties not shown here) exhibit strong fluctuations
and correlations. Combined with the non-trivial geometrical
overlap between these images, this demonstrates the need for a flexible
projection framework. In the standard processing pipeline, these
images are processed and coadded in tiles (black line of Figure 1)
with the DESDM software, using the software packages described
in Sevilla et al. (2011); Desai et al. (2012); Mohr et al. (2012)^. The
operations performed in these codes unavoidably mix the image
properties across the coadds and affect the properties of detected
sources. The geometry of the DECam focal plane, a hexagonal
shape with 62 science CCDs (Flaugher et al. 2015; Honscheid et al.
2008), may also be imprinted in the reduced data. Therefore, one
would like to access the full distribution of the single-epoch
properties, and connect it to the coadds, catalogues, and sky coordinates.

^ With the exception of the Stripe 82 region, the deeper multi-epoch
programme of SDSS, and the small zones of overlap between the single-epoch
images.

Figure 1. Upper panel: geometrical projection of the single-exposure
images coadded in an arbitrary tile of the DES Science Verification data (black
contour). The colours correspond to different single-epoch pointings, with
the relevant CCDs shown as individual rectangles. Lower panels: properties
of the same set of CCDs, exhibiting significant variations and correlations.
The non-trivial, spatially-varying geometrical overlap and image properties
will result in spatially-varying systematics when analysing the galaxy
catalogues.

To construct sky maps of the single-epoch properties, we
proceed as follows. We first connect the single-epochs and coadds, and
keep track of which images were processed and coadded by the
DESDM software. We then resolve the geometry of all images so
that a given position on the sky is connected to a single coadd image
and to a set of single-exposure CCDs. This is realised by
accessing the images individually and using the WCS^ transformations
to convert local image coordinates into equatorial coordinates. We
also make sure these transformations match the procedures used in
the DES software^. Finally, we employ the HEALPix pixelisation
(Górski et al. 2005) and connect the tree of geometrically-resolved
images to HEALPix pixels on the sky.
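The bookkeeping described above (each sky pixel linked to the set of single-exposure CCDs covering it) can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not part of the DES pipeline: the sky is treated as flat, a regular (RA, Dec) grid stands in for HEALPix pixels, CCD footprints are axis-aligned boxes rather than WCS-projected polygons, and the CCD identifiers are hypothetical.

```python
from collections import defaultdict


def build_pixel_tree(ccds, ra_min, dec_min, pix_size, n_ra, n_dec):
    """Map each grid pixel to the ids of the CCD footprints covering its centre.

    `ccds` is a dict {ccd_id: (ra_lo, ra_hi, dec_lo, dec_hi)} in degrees.
    The regular grid is a flat-sky stand-in for HEALPix pixels (assumption).
    """
    tree = defaultdict(list)
    for i in range(n_ra):
        for j in range(n_dec):
            ra = ra_min + (i + 0.5) * pix_size
            dec = dec_min + (j + 0.5) * pix_size
            for ccd_id, (ra_lo, ra_hi, dec_lo, dec_hi) in ccds.items():
                if ra_lo <= ra < ra_hi and dec_lo <= dec < dec_hi:
                    tree[(i, j)].append(ccd_id)
    return dict(tree)


# Two hypothetical, partially overlapping CCD footprints (degrees).
ccds = {
    "exp1_ccd01": (0.0, 0.3, 0.0, 0.15),
    "exp2_ccd07": (0.2, 0.5, 0.05, 0.2),
}
tree = build_pixel_tree(ccds, ra_min=0.0, dec_min=0.0,
                        pix_size=0.05, n_ra=10, n_dec=4)
```

Pixels in the overlap region end up with two CCD identifiers, which is the per-pixel "vector" that the scalar maps are later computed from.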

^ Including SCAMP (astrometry, Bertin 2006), SWARP (image coaddition,
Bertin et al. 2002), PSFEx (modelling of the point-spread function,
Bertin 2011) and SExtractor (object detection and measurement, Bertin &
Arnouts 1996).

^ WCS refers to the World Coordinate System of the FITS format
(Calabretta & Greisen 2002).

^ In particular, DES images make use of the WCS TPV projection, built on
the standard TAN projection and adding general polynomial corrections.

Figure 2. Projection of the properties of the single-epoch images of
Figure 1, showing how the time fluctuations and correlations are converted into
spatial fluctuations. ADUs are Analog-to-Digital Units.

The previous construction gives access to the full joint
distribution of single-epoch and coadd image properties on the sky. This
is a complicated object since each HEALPix pixel contains a
vector of image properties. As mentioned before, a crucial product is
the projection of this joint distribution into scalar sky maps. This
requires the computation of one value (such as a summary statistic)
per pixel, e.g., compressing the vector of seeing values in each pixel
into mean, median, standard deviation, or even minimum and
maximum values. This process can be done for any quantity of interest,
with arbitrary weights. This is how any potential source of spatial
systematics arising from single-epoch images can be mapped onto
the sky. Figure 2 shows the result of projecting some of the
properties of the images of Figure 1. We see that the geometry of the
CCDs as well as the relative orientations of the focal planes for the
various exposures strongly affect the coverage and mean properties
of the survey.
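As an illustration of this compression step, the sketch below turns a per-pixel list of seeing values into one scalar map per summary statistic. The pixel indices and seeing values are made up, and the unweighted statistics shown here are only the simplest choice; the text goes on to motivate weighted averages.

```python
import statistics


def summarise(values):
    """Compress the vector of single-epoch values in one pixel into scalars."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical input: pixel index -> seeing (arcsec) of every overlapping CCD.
seeing_per_pixel = {101: [0.9, 1.1, 1.3], 102: [1.0, 1.4]}

# One scalar sky map (pixel index -> value) per summary statistic.
maps = {stat: {} for stat in ("mean", "median", "std", "min", "max")}
for pix, values in seeing_per_pixel.items():
    for stat, val in summarise(values).items():
        maps[stat][pix] = val
```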


2.2 Application to DES SV data 

Science Verification (SV) data refers to the testing data acquired
between November 2012 and February 2013, processed by the
“SVA1” version of the DESDM pipeline (Yanny et al. 2015) and
consisting of 858 coadd tiles, 665 of which have data in all five
grizY bands. The SV data cover more than 300 deg^2 in total, split
into contiguous regions of interest: the large SPT-E and SPT-W
regions (~200 and 50 deg^2, respectively), the RXC J2248, Bullet,
and El Gordo known rich clusters (~10 deg^2 each), COSMOS
(~6 deg^2) and the Supernovae fields SN-E, SN-X, SN-S, SN-C
(~10 deg^2 each). The footprint of DES-SV is shown in Figure 3
with the various fields labelled.

Figure 3. The DES SV footprint, partitioned into several discontinuous
regions, the largest being the SPT-E and W fields (~200 and 50 deg^2,
respectively). The small red regions contain objects where spectroscopic
redshifts are available, used to train the photometric redshift estimation codes,
as discussed in Section 3.

To create maps for the full SV data set, we create a
symbolic tree of CCD and coadd images, resolve their geometries, and
project them into HEALPix maps. However, unlike the illustration
shown in Figure 1 and Figure 2, we now crop the projection to each
tile (the black contour of Figure 1) since the DESDM software
separately processes tiles using the stacks of CCD images. We perform
this projection in the full SV area and stitch the projected coadds to
assemble the full SV footprint. In terms of outputs, we project the
following quantities in the five grizY bands: airmass, seeing, sky
brightness, sky sigma (defined later in this paragraph), and
exposure time. These can all affect the quality of the photometric
measurements (see e.g., Li et al. 2015). We compress the multi-epoch
information into average and total maps (e.g., mean seeing and total
exposure time). For the former, a natural choice would be to take
the uniformly-weighted average in each HEALPix pixel. However,
this choice is probably too simplistic, as in practice images are
coadded using weights derived from the flux variance. More
precisely, the DESDM pipeline provides a ‘weight’ or ‘variance’ map
for each single-epoch image. An additional quantity, coined ‘sky
sigma’, characterises the variance of the flux in each pixel. For an
image i and a given pixel, it is denoted by $\sigma_i$, and depends on a
number of parameters, including the flux itself, the gain of the
amplifier, the readout noise, the bias correction, and the flat-fielding.
Single-epoch images are coadded using these variance maps such
that the coadded flux is the weighted average over all exposures,


$$F_{\rm tot} = \frac{\sum_i w_i \, p_i \, F_i}{\sum_i w_i} \qquad (1)$$

in each coadd pixel, where $w_i = 1/(p_i^2 \sigma_i^2)$. The extra $p_i$ are
rescaling factors to enforce a common photometric calibration to
the single-epoch fluxes. They read

$$p_i = 100^{(m_z - m_{z,i})/5} \qquad (2)$$

where $m_{z,i}$ is the zero point magnitude of the single-epochs and
$m_z$ that of the coadd image. The variance of the total flux in each
pixel of the coadd image is given by

$$\sigma_{\rm tot}^2 = \left( \sum_i w_i \right)^{-1} \qquad (3)$$
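The coaddition of a single pixel can be written out numerically as a short sketch. It assumes inverse-variance weights of the rescaled fluxes, $w_i = 1/(p_i \sigma_i)^2$, with $p_i = 100^{(m_z - m_{z,i})/5}$, so that the variance of the weighted average is $1/\sum_i w_i$. The function names and input numbers are illustrative only; in the actual pipeline this step is performed by SWARP.

```python
def rescale_factor(mz_coadd, mz_single):
    """Flux rescaling p_i bringing a single epoch onto the coadd zero point."""
    return 100.0 ** ((mz_coadd - mz_single) / 5.0)


def coadd_pixel(fluxes, sigmas, zero_points, mz_coadd):
    """Weighted-average flux and total sky sigma for one coadd pixel.

    Assumes inverse-variance weights w_i = 1 / (p_i * sigma_i)**2.
    """
    p = [rescale_factor(mz_coadd, mz) for mz in zero_points]
    w = [1.0 / (pi * si) ** 2 for pi, si in zip(p, sigmas)]
    f_tot = sum(wi * pi * fi for wi, pi, fi in zip(w, p, fluxes)) / sum(w)
    sigma_tot = (1.0 / sum(w)) ** 0.5
    return f_tot, sigma_tot


# Two hypothetical exposures with identical zero points and noise levels.
f_tot, sigma_tot = coadd_pixel(
    fluxes=[100.0, 104.0], sigmas=[2.0, 2.0],
    zero_points=[30.0, 30.0], mz_coadd=30.0,
)
```

With two equally noisy exposures the coadded flux is the plain mean and the noise drops by a factor of $\sqrt{2}$, as expected when adding independent noise in quadrature.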


A detailed discussion of these quantities is beyond the scope of
this paper, but we note that the total sky sigma is directly related
to the magnitude limit of the survey. In the above formulae, we
omitted the pixel indexing in $\sigma_i$, but the coadding and the
evaluation of $\sigma_{\rm tot}$ must be performed pixel by pixel across the coadd
image. The technicalities of this process (including the projection
and coadding) are handled by the SWARP software (Bertin et al.
2002). Yet, the projection formalism presented above can be used
to quickly estimate $\sigma_{\rm tot}$ (and for example construct approximate
magnitude limit maps). For this purpose we compute an average
sky sigma per single-epoch CCD image, defined as the pixel
average of $\sigma_i$ across the CCD. Rather than computing $\sigma_i$ and
$\sigma_{\rm tot}$ in every pixel, we only need to calculate $\sigma_i$ per CCD and
$\sigma_{\rm tot}$ in the distinct regions of image overlap, as shown in Figure 1.
This yields a significant reduction of the complexity of the full
projection, which needs to be performed for the five bands for a
number of quantities of interest, using several hundred thousand
single-epoch images. Finally, any quantity of interest can be averaged
using the same weights $w_i$ (which we call ‘sky sigma weights’), which is
more useful than the unweighted average. The effective seeing of
the coadd images is better approximated by the sky sigma-weighted
mean since the coadds are based on these weights.

Figure 4. Maps of some of the main observational quantities (potential
sources of systematics) in the SPT-E and W fields (top and bottom of each
sub-panel). The HEALPix maps are produced at $N_{\rm side} = 4096$, where
each pixel is the mean value of the observed $N_{\rm side} = 16384$ sub-pixels, in
order to obtain more accurate values near the edges of the survey.

Figure 5. Full-sky angular power spectra of some of the i band
observational systematics shown in Figure 4. Prior to power spectrum estimation,
the maps were divided by their average values in order to obtain
dimensionless $C_\ell$s. The power spectrum of the DES-SV coverage mask (presented
in Rykoff et al. 2015; Crocce et al. 2015) is also shown in black. Any
excess of power relative to the mask implies structure and features in the maps,
which can yield non-trivial contamination and systematics in the galaxy
catalogues. We also indicate the characteristic scales affected by the geometry
of the SV survey and DECam instrument.
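The per-CCD shortcut and the sky sigma-weighted averages discussed in this section can be sketched as follows. The table layout, the CCD identifiers and the numbers are hypothetical, and the weights are assumed to take the inverse-variance form $w_i = 1/(p_i \sigma_i)^2$, with $\sigma_i$ the CCD-averaged sky sigma.

```python
def region_summary(ccd_ids, ccd_table):
    """Approximate total sky sigma and weighted seeing in one overlap region.

    `ccd_table` maps a CCD id to its CCD-averaged sky sigma, flux rescaling
    factor p and seeing (hypothetical layout).
    """
    weights = [
        1.0 / (ccd_table[c]["p"] * ccd_table[c]["sky_sigma"]) ** 2
        for c in ccd_ids
    ]
    wsum = sum(weights)
    sigma_tot = wsum ** -0.5
    seeing = sum(w * ccd_table[c]["seeing"]
                 for w, c in zip(weights, ccd_ids)) / wsum
    return sigma_tot, seeing


ccd_table = {
    "A": {"sky_sigma": 2.0, "p": 1.0, "seeing": 0.9},  # low-noise exposure
    "B": {"sky_sigma": 4.0, "p": 1.0, "seeing": 1.5},  # noisy exposure
}
sigma_ab, seeing_ab = region_summary(["A", "B"], ccd_table)
sigma_a, seeing_a = region_summary(["A"], ccd_table)
```

In this toy case the weighted seeing in the overlap region is 1.02 arcsec, closer to the low-noise exposure than the unweighted mean of 1.2 arcsec, which illustrates why the weighted mean is the better proxy for the coadd.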

A number of maps were constructed for the DES-SV data, in 
order to capture the spatial fluctuations of the observing conditions 
and other observational quantities. They are used in numerous SV 
analyses to perform spatial null tests with the data (e.g., Vikram 
et al. 2015; Crocce et al. 2015; Jarvis et al. 2015; Becker et al. 
2015; Giannantonio et al. 2015). Figure 4 shows some of the main 
maps for the i band: the total exposure time, the mean sky sigma 
and total sky sigma, and the minimum, maximum and mean seeing. 
All quantities were calculated according to the previous scheme, 
i.e., the weighted average method and the sky sigma weights, with 
the exception of the mean sky sigma maps. This is because the 
weighted sky sigma is equivalent to the total sky sigma described 
above. Showing both maps sheds light on the difference between 
adding the noise properties linearly or in quadrature. In the following
we analyse these maps and detail the implications for the
analyses of SV data. We focus on the SPT-E and W regions since 
they are the largest contiguous regions of the data. 

2.3 Analysis of the DES SV observing conditions 

The maps shown in Figure 4 exhibit significant structure and
features on all scales, mostly because DES data have three intrinsic
scales on which their properties can vary: the size of the DECam
focal plane (2.2 deg diameter field of view), the coadd tile
(0.75 × 0.75 deg^2), and the single CCD (0.3 × 0.15 deg^2). In spite
of the random offsets and overlap of the focal plane when obtaining
images and coadding them, these three scales get imprinted in
the projected observing conditions. For example, the focal plane
geometry is clearly visible in the total sky sigma maps in a number
of regions. This is due to a significantly lower or greater number of
observations, or to their respective noise levels (sky sigma). Also,
the mean seeing map is affected by outliers, i.e., by extreme (low
or high) values of seeing in the set of single-epochs, as shown in
the min/max maps in the bottom of Figure 4. The rectangular CCD
geometry is also visible in the maps, especially near the edges. In
addition, the observing properties of the 62 CCDs in a given
single-epoch are highly correlated since they experience quasi-identical
observing conditions. By contrast, correlations between exposures are
due to proximity in time, for example if the observations were taken
on the same night. Finally, the tile edges are particularly visible in
truncated regions or where different zero point magnitudes were
applied (e.g., the centre of SPT-W, or the sharp transition in the upper
part of SPT-E).

To identify which scales may be affected by the features
described above, we compute the full sky angular power spectra of
the maps in Figure 4 (the full SV, not only the SPT-E and W
regions). The results are shown in Figure 5; all spectra are made
dimensionless and normalised such that = 1 to clarify the
comparison. As seen before, all maps exhibit significant power on
all scales. The labels show which multipole ranges correspond to
the typical scales of the SV fields, DECam focal plane, tiles, and
CCDs. It is important to note that many of the features of Figure 5
are due to the sky coverage (i.e., the footprint) of SV, not the
correlations in the observed regions. This is emphasised by an extra line
showing the power spectrum of the DES-SV footprint mask. Here
we do not deconvolve the effect of the mask on the power spectra
because it typically redistributes the power between the $\ell$ modes.
In the pseudo-spectrum estimation method, this deconvolution
assumes flat priors on the power spectra, while quadratic maximum
likelihood estimators can incorporate more flexible priors on the
power spectra (see e.g., Leistedt et al. 2013). This deconvolution
would significantly affect the observed power spectra due to the
small sky coverage of SV data. By contrast, not deconvolving the
mask enables one to separate the scales affected by the survey
coverage and by the observing conditions. The significant power in the
$\ell \in [0, 200]$ range is mostly due to the size and shape of the SV
fields (all fields except SPT-E and W have approximately the size
of the focal plane). In the other power spectra, any power in excess
of the black line is due to structure within the fields, i.e., to the
features described previously. As expected, airmass and seeing maps
mostly have additional power on small scales. The sky sigma
maps, however, have much more power on all scales, in particular around
the focal plane and coadd scales.
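To get a rough feel for where the three instrumental scales fall in multipole space, one can use the common rule of thumb $\ell \approx 180^\circ/\theta$. This conversion is an assumption made for illustration and is not necessarily the one used to place the labels in Figure 5.

```python
def multipole(theta_deg):
    """Rule-of-thumb multipole for an angular scale in degrees (l ~ 180/theta)."""
    return 180.0 / theta_deg


# Angular scales quoted in the text (degrees).
scales_deg = {
    "focal plane": 2.2,
    "coadd tile": 0.75,
    "CCD long side": 0.3,
    "CCD short side": 0.15,
}
ells = {name: round(multipole(theta)) for name, theta in scales_deg.items()}
```

Under this approximation the focal plane corresponds to multipoles of order 80, the coadd tiles to a few hundred, and the CCDs to several hundred up to about a thousand.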

As seen in Figure 4, the maps of the observing conditions are 
correlated. Figure 6 shows the Pearson correlation coefficients of 
the DES-SV maps in the gri bands (calculated for the full SV area). 
These spatial correlations have two origins: the time correlations 
between observations made closely spaced in time, and physical 
correlations between some of the properties. For example, the noise 
level and seeing are correlated. 
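A Pearson correlation coefficient between two maps can be computed over the pixels they have in common, as in the sketch below. The maps are represented as dictionaries from pixel index to value, and the numbers are invented so that the two maps are exactly linearly related.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def map_correlation(map_a, map_b):
    """Correlate two sky maps over the pixels observed in both."""
    common = sorted(set(map_a) & set(map_b))
    return pearson([map_a[p] for p in common], [map_b[p] for p in common])


# Hypothetical maps: pixel index -> value. Pixel 5 is missing from the first.
seeing_map = {1: 0.9, 2: 1.0, 3: 1.2, 4: 1.4}
sky_sigma_map = {1: 2.0, 2: 2.2, 3: 2.6, 4: 3.0, 5: 9.9}
r = map_correlation(seeing_map, sky_sigma_map)
```

Restricting the calculation to common pixels plays the role of a joint mask, so pixels observed in only one map do not enter the coefficient.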

In conclusion, the observing conditions fluctuate significantly
on a wide range of scales, and may affect the properties of the
galaxies detected in DES coadd images. Any resulting spurious
spatial correlations that propagate into the galaxy catalogues will
need to be detected and eliminated. Typical techniques to mitigate
these effects in clustering analyses include modelling the survey
window function (e.g., Maddox, Efstathiou & Sutherland 1996;
Blake et al. 2010), or using cross-correlations (Scranton et al. 2002;
Ross et al. 2011, 2012; Ho et al. 2012; Crocce et al. 2015) or
mode-projection (Leistedt et al. 2013; Leistedt & Peiris 2014) to
correct or mask the spatial modes affected by the observing
conditions. Crucially, these approaches require the availability of
accurate templates of the sources of systematics, which is precisely
what we constructed in this section. We now turn to a concrete
example of use of these templates.

Figure 6. Correlation coefficients between some of the maps produced for
the DES-SV data.

Figure 7. Maps of the i band SV observing conditions incorporated in the
BCC-UFig simulation, obtained by smoothing the maps of Figure 4 in tiles.
The analysis mask shows the region considered when measuring the redshift
distributions and galaxy number densities presented below.


3 APPLICATION TO BCC-UFIG 

The BCC-UFig (Chang et al. 2014) is a framework of image-level
simulations of the DES-SV data. It relies on the Ultra Fast Image
Generator (UFig, Berge et al. 2013) and the BCC cosmological
simulations (the Blind Cosmology Challenge, Busha et al. 2013) in
order to obtain realistic images of a galaxy survey simulated with a
known cosmological model. The BCC-UFig covers the SPT DES-SV
region, and consists of 480 coadd images in the griz bands,
and 432 in the Y band. As detailed in Chang et al. (2014), these
images were processed using the same software packages as the
DESDM SVA1 pipeline. In this paper we exploit the fact that the
simulated BCC-UFig images integrate some of the actual observing
conditions of the DES-SV data. In particular, the simulated coadd
images incorporate the median values of the seeing, limiting
magnitude, and magnitude zeropoint of the true DES-SV images. These
quantities were obtained from the products presented in the
previous section (i.e., maps of the median observing conditions,
analogous to the mean maps shown in Figure 4), by averaging each map
over the surface of the tiles. The result of this smoothing is shown
in Figure 7, and we comment on its effect on the galaxy catalogues
below. The fact that the BCC-UFig is based on simulated coadd
images and not on single-epochs is the main difference with the real
SV data. However, as discussed below, BCC-UFig reproduces most
of the spatial systematics found in the data and relevant to
clustering and weak lensing analyses, because these are due to fluctuations
in observing conditions at scales larger than the coadds.
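The tile-level smoothing applied to the maps before they enter the simulation can be sketched as a simple group-average. The pixel-to-tile lookup and the tile names below are hypothetical; in practice this assignment would come from the tile geometry.

```python
from collections import defaultdict


def smooth_in_tiles(pixel_values, pixel_to_tile):
    """Replace every pixel value by the mean over the tile it belongs to."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pix, val in pixel_values.items():
        tile = pixel_to_tile[pix]
        sums[tile] += val
        counts[tile] += 1
    tile_mean = {tile: sums[tile] / counts[tile] for tile in sums}
    return {pix: tile_mean[pixel_to_tile[pix]] for pix in pixel_values}


# Hypothetical median-seeing map (arcsec) and pixel-to-tile assignment.
median_seeing = {0: 1.0, 1: 1.2, 2: 0.8, 3: 1.0}
pixel_to_tile = {0: "T1", 1: "T1", 2: "T2", 3: "T2"}
smoothed = smooth_in_tiles(median_seeing, pixel_to_tile)
```

All structure below the tile scale is erased by construction, which is why the simulation can only reproduce systematics that vary on scales larger than the coadds.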

Survey simulations like the BCC-UFig can be used to test
analysis techniques and pipelines in the presence of realistic
systematics. For example, Chang et al. (2014) used the BCC-UFig
to compare the performances of various star-galaxy classifiers and
study the evolution of the observed galaxy and stellar densities as a
function of some of the observing conditions (depth, seeing,
Galactic latitude). Such tests cannot be performed at high significance in
the real data due to the small size and sky coverage of the sample of
spectroscopically confirmed galaxies (based on the COSMOS and
SN fields, shown in red in Figure 3).

In this paper, we produce galaxy catalogues based on the
BCC-UFig and compare them with the SV data catalogues. We
mostly attempt to mimic the galaxy catalogues used in the
clustering and cross-correlation analyses of SV (Crocce et al. 2015;
Giannantonio et al. 2015). We first construct a multi-band
catalogue by cross-matching the positions of the objects detected in the
griz bands. We then remove all objects with extreme fluxes or
colours: x > 30 and x - y > 3 or x - y < -1, where x and y are
mag_auto magnitudes in griz bands measured by SExtractor. To
select galaxies in this catalogue, we use the ‘modest’ classifier. As
described in Chang et al. (2014); Soumagnac et al. (2013), objects
are labelled as galaxies by this classifier if they do not satisfy any
of the following criteria: (mag_auto_i < 18 and class_star
> 0.3) or (spreadmodel_i + 3*spreadmodel_err_i < 0.003)
or (mag_auto_i < 21 and mag_psf_i > 30). Finally, we only
consider objects with 18 < mag_auto_i < 22.5 in the SPT-E region,
and we split this galaxy sample into redshift bins using photometric
redshifts. We now investigate the realism of these galaxy samples,
first in terms of their redshift distributions.
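The selection just described can be written as a few boolean cuts. The sketch below is illustrative only: the dictionary keys follow the column names as they appear in the text, the colours are assumed to be formed from adjacent bands (g-r, r-i, i-z), which the text does not state explicitly, and the two example objects are invented.

```python
BANDS = ("g", "r", "i", "z")


def passes_flux_colour_cuts(mag_auto):
    """Reject extreme magnitudes and colours (adjacent-band colours assumed)."""
    if any(mag_auto[b] > 30 for b in BANDS):
        return False
    for blue, red in zip(BANDS[:-1], BANDS[1:]):
        colour = mag_auto[blue] - mag_auto[red]
        if colour > 3 or colour < -1:
            return False
    return True


def is_modest_galaxy(obj):
    """An object is a galaxy if none of the three star-like criteria holds."""
    bright_star = obj["mag_auto_i"] < 18 and obj["class_star"] > 0.3
    point_like = obj["spreadmodel_i"] + 3 * obj["spreadmodel_err_i"] < 0.003
    bad_psf_mag = obj["mag_auto_i"] < 21 and obj["mag_psf_i"] > 30
    return not (bright_star or point_like or bad_psf_mag)


def in_sample(obj):
    """Full selection: colour cuts, galaxy classification, magnitude range."""
    mags = {b: obj["mag_auto_" + b] for b in BANDS}
    return (passes_flux_colour_cuts(mags)
            and is_modest_galaxy(obj)
            and 18 < obj["mag_auto_i"] < 22.5)


# Invented examples: an extended source and a point-like source.
galaxy = {"mag_auto_g": 22.5, "mag_auto_r": 21.8, "mag_auto_i": 21.2,
          "mag_auto_z": 20.9, "class_star": 0.02, "spreadmodel_i": 0.01,
          "spreadmodel_err_i": 0.001, "mag_psf_i": 21.6}
star = {"mag_auto_g": 20.0, "mag_auto_r": 19.5, "mag_auto_i": 19.3,
        "mag_auto_z": 19.2, "class_star": 0.98, "spreadmodel_i": 0.0005,
        "spreadmodel_err_i": 0.0002, "mag_psf_i": 19.3}
```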


3.1 Photometric redshifts and redshift distributions 

Photometric redshifts (photo-z), i.e., redshifts estimated from
broadband fluxes and colours, are one of the main sources of
uncertainties in imaging surveys, and it is essential to reproduce this
aspect of the data with BCC-UFig galaxies. We employ three photo-z
codes: BPZ (Benítez 2000; Coe et al. 2006), TPZ (Carrasco Kind &
Brunner 2013, 2014), and ANNz2 (Sadeh, Abdalla & Lahav 2015).
These rely on very distinct algorithms that were tested on early SV
data in Sanchez et al. (2014). They are also used in the main SV
clustering and cosmic shear analyses (Giannantonio et al. 2015;
Crocce et al. 2015; Jarvis et al. 2015; Becker et al. 2015). For
details on the three codes and an updated comparison using SV data,
we refer the reader to Bonnett et al. (2015). Here we only provide a
brief summary of the three algorithms and focus on comparing the
redshift distributions inferred from the SV data and the BCC-UFig
simulation.

Figure 8. Redshift distributions of the SV data (left) and BCC-UFig (right)
catalogues obtained using the photometric redshift estimation methods
trained on a spectroscopic sample of galaxies (see text for details). They
are normalised such that $\int N(z)\,dz = 1$. By comparison, the variance
obtained by randomly splitting the redshift samples is of the order of 0.01 in
all redshift bins.

BPZ is a Bayesian template fitting photo-z code that relies on
a set of calibrated template spectra, which are redshifted and
converted into template colours using the DES filters. It computes a
posterior probability for the redshift of each object given its
observed colours and errors, by fitting for all templates and
marginalising over the choice of template. By contrast, TPZ and ANNz2
are machine learning codes that must be trained on a representative
sample of the data to infer a set of heuristic rules (i.e., a flexible
data-driven model) to compute the redshift from the observed
photometric colours. TPZ is a publicly available code^ based on
prediction trees and random forests, while ANNz2 uses a combination
of machine learning algorithms, including neural networks
and k-nearest neighbours. The three photo-z codes deliver a redshift
probability distribution function (PDF) and a photo-z point
estimate, usually measured as the mean or the mode of the PDF.

^ http://lcdm.astro.illinois.edu/code/mlz.html

Figure 9. Difference between the redshift distributions for mean seeing
bins, i.e., computed for the good (low seeing) and bad (high seeing) regions.

We employ the BPZ, TPZ and ANNz2 algorithms that were
trained and calibrated on the SV data, more specifically on the
sample of galaxies presented in Bonnett et al. (2015) for which
spectroscopic redshifts are also available (about 46,000 galaxies). This
sample is shown in red in Figure 3 and was used to calibrate the
BPZ template prior and train the TPZ and ANNz2 methods. Note
that we only use mag_auto magnitudes and colours with BPZ and
ANNz2, and we only include the magnitude errors in the training
of TPZ (where they are used to perturb the magnitudes when
retraining the prediction trees, in order to obtain reliable redshift
posterior PDFs).

Following most analyses of SV (e.g., Crocce et al. 2015;
Giannantonio et al. 2015), we create five BCC-UFig redshift samples
by selecting the objects with photometric redshift falling in a top
hat window of size Δz = 0.2 in the range 0.2 < z < 1.2. We
use the ANNz2 photo-z point estimates to bin our data in the
redshift ranges, i.e., to select the objects that fall in each redshift bin.
We then reconstruct the N(z) by stacking the redshift PDFs of the
selected objects for the three codes. Figure 8 shows the redshift
distributions of the SV data samples compared with their BCC-UFig
counterparts. We recall that the BCC-UFig and SV data were
subject to the same colour and quality cuts, and restricted to the
same portion of the sky: the SPT-E region analysis mask shown in
Figure 7. Hence, the inferred redshift distributions should match
relatively well, since the colours of the BCC-UFig galaxies were
shown to correctly match those of the data in Chang et al. (2014).
This is confirmed by Figure 8: when comparing the left and right
panels, the features and relative amplitudes of the N(z)
inferred from the three codes are very similar.
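
The selection and stacking steps described above can be sketched in a
few lines of Python. This is only an illustrative sketch, not the code
used for the SV analyses: the function names, the assumption that every
PDF is sampled on a common uniform redshift grid, and the simple
Riemann-sum normalisation are all choices made here for the example.

```python
def tophat_bins(z_min=0.2, z_max=1.2, width=0.2):
    """Return (low, high) edges of contiguous top-hat redshift bins."""
    n_bins = round((z_max - z_min) / width)
    return [(z_min + i * width, z_min + (i + 1) * width)
            for i in range(n_bins)]


def stacked_nz(z_point, pdfs, z_grid, z_lo, z_hi):
    """Stack the PDFs of objects selected by their point estimate.

    z_point : photo-z point estimates used for the selection
    pdfs    : per-object redshift PDFs sampled on z_grid
    z_grid  : common redshift grid, assumed uniformly spaced
    Returns the stacked N(z), normalised so that sum(N) * dz = 1.
    """
    dz = z_grid[1] - z_grid[0]
    stack = [0.0] * len(z_grid)
    for zp, pdf in zip(z_point, pdfs):
        if z_lo <= zp < z_hi:          # top-hat selection window
            for k, p in enumerate(pdf):
                stack[k] += p
    norm = sum(stack) * dz
    return [s / norm for s in stack] if norm > 0 else stack
```

In this sketch the point estimates of one code (ANNz2 in the text) would
be passed as z_point, while pdfs would be swapped between the three
codes to obtain the three N(z) estimates for the same selected objects.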

An important difference between the left and right panels is
that the true redshift distributions can be calculated for the
BCC-UFig and be compared with the distributions inferred using
photometric redshifts. Analysing the detailed performance of the photo-z
codes is beyond the scope of this paper; a full investigation in the
context of the weak-lensing SV data samples is presented in
Bonnett et al. (2015). However, the results of Figure 8 show that most
features of the data redshift distributions are recovered in the
simulation. For instance, BPZ yields wider N(z)s than the machine learning
methods, and is less accurate near z ~ 0.4 due to the layout of the
DES grizY filters and the limitations of the set of template spectra.
Also, the redshift distributions inferred by TPZ are narrower than
the true underlying distribution. These features persist when
selecting galaxies with BPZ or TPZ photo-z point estimates. Selecting
with ANNz2 minimises the width of the inferred N(z) from the
three methods, and reduces the number of low-redshift outliers in
the third bin.

The comparison of true and inferred redshift distributions is
not trivial with the SV data given the small sample sizes of
spectroscopically confirmed galaxies, especially at high redshift. For this
reason, a realistic survey simulation like BCC-UFig is a powerful
tool for testing critical analysis stages such as photometric redshift
estimation, in regimes that are difficult to explore with the data.
More specifically, Figure 8 demonstrates that the features seen in
the redshift distributions calculated for the SV data are compatible
with and well-reproduced by the BCC-UFig simulation.

We now challenge an assumption made above (and in current
SV analyses), namely that the redshift distributions can be spatially
averaged over a large area (here the entire SPT-E region) without
accounting for systematics. While the analysis of SV data is restricted
to the most uniform regions, as shown by the analysis mask in
Figure 7, these include unavoidable residual depth and quality
fluctuations. The mean N(z) is the main quantity of interest for
cosmological analyses. However, its variance due to statistical and
systematic uncertainties must be evaluated, in order to assess the
robustness of theoretical clustering and gravitational lensing predictions
using the N(z). As we will see below, the statistical fluctuations
are small, as expected from the area and number density of objects.
However, it is essential to test for residual spatial systematics in the
inferred redshift distributions.

Here we focus on testing the variability of N(z) distributions
due to residual depth fluctuations and observational systematics in
the SPT-E region. For each of the quantities presented in the
previous section (e.g., seeing) we compute the median value and use
it to split each redshift sample into two subsamples. These cover
different regions of the sky, observed under different conditions.
We compute the difference between the redshift distributions
corresponding to the two patches, i.e., taking the redshift distribution
of the galaxies in the region where the systematic is above the
median value, and subtracting that of the galaxies in the other region.
Figures 9 and 10 show these differences for the i band seeing and
exposure time, respectively; these are two significant sources of
spatially-varying depth in the SV data (e.g., Crocce et al. 2015).
Importantly, the variance obtained by randomly splitting the
redshift samples (instead of splitting based on observing conditions) is
of the order of 0.01 in all redshift bins.

Figure 10. Same as Figure 9, but for exposure time. However, note that the
fluctuations go in the opposite direction since we compute low minus high
values, which corresponds to bad and good regions for exposure time.
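
As an illustration of this split test, the sketch below computes the
difference between the normalised redshift histograms of two subsamples
defined by the median of an observing-condition value, together with
the same difference for a random half/half split as a noise reference.
The function names and inputs are hypothetical, histograms of point
redshifts stand in for stacked PDFs, and the sign is taken here as low
minus high, following the captions of Figures 9 and 10; swapping the
two terms gives the opposite convention.

```python
import bisect
import random
import statistics


def normalised_hist(values, edges):
    """Histogram of values in the given bin edges, integrating to 1."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        k = bisect.bisect_right(edges, v) - 1
        if 0 <= k < len(counts):
            counts[k] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / (total * (edges[k + 1] - edges[k]))
            for k, c in enumerate(counts)]


def nz_split_difference(z, systematic, edges):
    """N(z) of the low-systematic half minus N(z) of the high half."""
    med = statistics.median(systematic)
    low = [zi for zi, s in zip(z, systematic) if s <= med]
    high = [zi for zi, s in zip(z, systematic) if s > med]
    return [a - b for a, b in zip(normalised_hist(low, edges),
                                  normalised_hist(high, edges))]


def random_split_difference(z, edges, seed=0):
    """Same difference for a random half/half split of the sample."""
    rng = random.Random(seed)
    shuffled = list(z)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return [a - b for a, b in zip(normalised_hist(shuffled[:half], edges),
                                  normalised_hist(shuffled[half:], edges))]
```

Repeating random_split_difference over many seeds would give an
empirical estimate of the scatter expected from sample variance alone,
against which the systematic-based split can be compared.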

These figures indicate that the N(z) differences are significant
compared to the sample variance. This is expected since good
regions (e.g., low seeing or high exposure time) have lower noise and
better photometry. As a consequence, the photo-z codes will have
better overall quality and yield narrower redshift PDFs. Therefore,
the redshift distributions derived when selecting objects in top hat
redshift windows will be more accurate. In our difference
convention, this translates into a positive bump surrounded by wells in
Figure 9 (since we take the difference between low seeing minus high
seeing regions), and the opposite in Figure 10 (where we compute
low exposure time minus high exposure time). This is indeed
observed in most bins, although it depends on the details of the
photometric redshift estimation.

While the observed N(z) fluctuations are significant
compared to the sample variance, they are small compared to the overall
amplitudes shown in Figure 8 (less than 5% in all the cases we
tested). This can be seen not only in the histograms of the true
redshifts from BCC-UFig, but also in the N(z) inferred from the
photo-z codes. In fact, in the right panels corresponding to
BCC-UFig, these distributions follow the fluctuations of the true
redshifts. This is not the case in all panels, because low and high
redshift objects suffer from other issues that make the comparison
difficult. In particular, we did not re-weight the redshift distributions
to adjust the colour distributions of the training, validation and data
samples, as in Bonnett et al. (2015) and Sanchez et al. (2014). Such
corrections do not affect the comparison between the data and the
simulation.

Figure 11. Changes in the galaxy number densities (n_gals) relative to the mean (n̄) in each redshift bin, as a function of some observational properties, also
normalised to their mean values. Both the SV data (black circles) and BCC-UFig (red squares) exhibit similar fluctuations, which are small but significant
compared to the sample variance, calculated using jack-knife re-sampling in 50 sky regions.

This analysis provides an estimate of the order of magnitude
of the N(z) fluctuations due to observational sources of
systematics and spatially varying depth in the SV data. Provided the mean
N(z) is properly characterised, these fluctuations will not bias the
cosmological analyses. However, because they are due to residual
spatial systematics, they may cause other types of contamination in
the galaxy catalogues. This is shown in Crocce et al. (2015) and in
the next section, where seeing is found to spuriously correlate with
the SV data and contaminate the clustering measurements.


3.2 Spatial null tests 

We now turn to the spatial properties of the BCC-UFig redshift
samples. Figure 11 shows the average galaxy density measured in
the previous redshift samples as a function of a few sources of
systematics (median exposure time, seeing, and sky sigma). We
create these data points by jointly analysing HEALPix maps (at
Nside = 4096) of the galaxy redshift bins (SV data and
BCC-UFig) and the maps of observing conditions presented in the
previous sections. Prior to estimation, all maps are divided by their
mean values, so that the observables are dimensionless and
concentrated near the central values (1,1) in the panels of Figure 11.
The dynamical range explored by the galaxy densities in each panel
depends on the observational quantity under consideration. For
example, normalised seeing values are mostly concentrated between
0.9 and 1.1, while exposure times span a wider range, as can be
verified in Figure 4. The error bars are obtained by jack-knife
re-sampling in 50 sky regions, which is possible thanks to the large
number of objects (greater than 10^ in each region).
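
A measurement of this kind can be sketched as follows. This is a
minimal illustration rather than the pipeline used for Figure 11: the
maps are represented as plain per-pixel lists restricted to the
analysis mask, the region labels assigning each pixel to a jack-knife
patch are assumed to be given, and the function names are hypothetical.
The errors use the standard delete-one jack-knife estimator.

```python
import bisect
import math


def density_vs_systematic(ngal, syst, edges):
    """Mean n_gal/<n_gal> in bins of syst/<syst> (NaN for empty bins)."""
    n_mean = sum(ngal) / len(ngal)
    s_mean = sum(syst) / len(syst)
    sums = [0.0] * (len(edges) - 1)
    hits = [0] * (len(edges) - 1)
    for n, s in zip(ngal, syst):
        k = bisect.bisect_right(edges, s / s_mean) - 1
        if 0 <= k < len(sums):
            sums[k] += n / n_mean
            hits[k] += 1
    return [t / h if h else math.nan for t, h in zip(sums, hits)]


def jackknife_errors(ngal, syst, region, edges):
    """Delete-one-region jack-knife errors on density_vs_systematic."""
    labels = sorted(set(region))
    estimates = []
    for lab in labels:
        keep = [i for i, r in enumerate(region) if r != lab]
        estimates.append(density_vs_systematic(
            [ngal[i] for i in keep], [syst[i] for i in keep], edges))
    n = len(labels)
    errors = []
    for k in range(len(edges) - 1):
        vals = [e[k] for e in estimates]
        mean = sum(vals) / n
        var = (n - 1) / n * sum((v - mean) ** 2 for v in vals)
        errors.append(math.sqrt(var))
    return errors
```

Because both maps are divided by their means before binning, a sample
with no dependence on the observing condition would scatter around 1 in
every bin, which is the null hypothesis being tested.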

Analogous galaxy density measurements are shown in
Suchyta et al. (2015) and Crocce et al. (2015) using the SV data.
Figure 11 shows very similar trends and amplitudes despite using
different maps (the median maps instead of the weighted mean maps).
The most significant fluctuations are due to the r and i band
seeing, particularly in the first and last redshift bins, in agreement with
what is found in Suchyta et al. (2015) and Crocce et al. (2015). Other
observational properties create similar but smaller fluctuations.
Remarkably, the BCC-UFig redshift samples exhibit similar galaxy
density fluctuations in most bins. In particular, Figure 11 shows
that the characteristic features of the galaxy density fluctuations
as a function of seeing and sky sigma are reproduced by BCC-UFig.
This demonstrates that the simulation succeeds in capturing some
of the galaxy density fluctuations caused by the systematics
considered here. The remaining qualitative and quantitative
discrepancies are likely due to the approximations adopted in the simulation.
The most significant effect is likely the incorporation of observing
conditions at the tile level instead of the single-epoch images: the
current implementation limits the spatial resolution of systematics
to relatively large scales.

More generally, it is interesting to quantify the extent to which
the maps capture depth fluctuations in the data. This is because the
effects described above (spurious spatial variations in the
redshift distributions and galaxy densities) are usually corrected for
or marginalised over in cosmological analyses. This is done either
at the level of the survey window function or in the measured power
spectra or correlation functions. We do not attempt to develop and
validate such a model since this must be done in the context of
the specific analysis at hand (e.g., clustering), which is beyond the
scope of this paper. However, we demonstrate that the maps trace
the main sources of systematics by showing that they strongly
correlate with depth fluctuations and stellar contamination. Figure 12
shows the Pearson correlation coefficient of some relevant
observing condition maps with (1) a map of the stars misclassified as
galaxies (by the ‘modest’ classifier) in the BCC-UFig galaxy
sample described above; (2) maps of the average i band magnitude
errors in the BCC-UFig and SV data (‘Gold’ catalogue, see Crocce
et al. 2015) in mag_auto_i magnitude bins. Figure 12 shows that
the exposure time and total sky sigma maps strongly correlate with
the magnitude errors in all bands and magnitude bins, in both the
data and the simulation, demonstrating that the maps capture most
of the depth fluctuations. In fact, a depth map of the SVA1 ‘Gold’
catalogue was constructed using the method described in detail in
Rykoff et al. (2015). Briefly, a coarse depth map is first constructed
by fitting the magnitude-magnitude error relation of galaxies,
exploiting the fact that the magnitude errors satisfy σ_m ∝ σ_F / F,
where F and σ_F are the galaxy flux and its standard deviation.
This relation depends on the local limiting magnitude of the survey,
which can be estimated in coarse HEALPix pixels where there are
enough galaxies to obtain precise limiting magnitude estimates (but
at low spatial resolution). This map is then refined by constructing a
data-driven model of the depth based on the maps of the observing
conditions presented here, which are available at very high
resolution (using machine learning algorithms, see Rykoff et al. 2015).
The maps were also used in Crocce et al. (2015) to build a linear
model of the spurious correlations observed in the angular
correlation functions, and correct for them.
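
The correlation test of Figure 12 amounts to evaluating a Pearson
coefficient between two pixelised maps over the pixels retained by the
analysis mask. The sketch below shows one way to do this; the function
name and the representation of maps and mask as equal-length lists are
assumptions made for the example, not a description of the actual code.

```python
import math


def pearson_masked(map_a, map_b, mask):
    """Pearson correlation of two maps over pixels where mask is True."""
    pairs = [(a, b) for a, b, m in zip(map_a, map_b, mask) if m]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two unmasked pixels")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant map has no defined correlation")
    return cov / math.sqrt(var_a * var_b)
```

For instance, map_a could hold the total sky sigma per pixel and map_b
the mean magnitude error of the galaxies in the same pixel for one
magnitude bin, with the mask selecting pixels that contain galaxies.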

Note that the correlations between the noise and the
magnitude errors are less significant in the simulation than in the data.
This is due to the approximation highlighted previously:
BCC-UFig is based on simulated coadd images, not on simulated
single-epochs. Hence, systematics at scales smaller than the coadds are
not resolved. The previous section showed that, under this approximation,
the simulation still correctly reproduces the systematics in the galaxy
densities and redshift distributions, which are due to large-scale
fluctuations of the observing conditions (e.g., seeing). However,
fluctuations in the noise and depth can be significant on sub-coadd
scales. This explains why the correlation coefficient between the map
of total sky sigma and the magnitude errors in BCC-UFig is less
significant than what is found in the data.

Finally, as shown in Figure 12, the observing condition maps
correlate with the stellar contamination in the BCC-UFig samples.
This test cannot be performed with the SV data since object types
are only available for a small sample of spectroscopically
confirmed sources, restricted to a small region of the sky, as shown
in the previous sections. A large, realistic simulation such as
BCC-UFig allows us to confirm that the observing conditions trace the
main sources of spatial systematics in the galaxy samples, and can
be used to model and remove them in cosmological studies.

Figure 12. Pearson correlation coefficients between observing conditions,
stars misclassified as galaxies in the BCC-UFig reduced data, and mean
i band magnitude errors (magerr_auto_i) in both the BCC-UFig and
SVA1-Gold galaxy catalogues (in magnitude bins, using mag_auto_i).
This shows that the maps of the observing conditions are significantly
correlated with the stellar contamination and depth fluctuations in the SV data
and simulations, therefore capturing the main sources of spatial systematics
present in the galaxy samples.


4 CONCLUSIONS AND OUTLOOK 

We detailed a method to extract and project the properties of
multi-epoch galaxy surveys onto the sky, making use of the properties
of the images and the HEALPix pixelisation. We applied this
technique to the DES SV data and mapped the main sources of
observational systematics, including the average properties of seeing,
airmass, and sky sigma. These maps will be made publicly
available in the forthcoming DES data releases, and are currently used
in analyses of SV data (e.g., Vikram et al. 2015; Crocce et al. 2015;
Jarvis et al. 2015; Becker et al. 2015; Giannantonio et al. 2015).

High-resolution maps of the observing conditions can be used
as templates to identify, model, and mitigate spatial systematics or
residual contamination in the data. As an illustration, we measured
the galaxy densities and redshift distributions of DES SV
tomographic redshift galaxy samples, and showed that they were
significantly affected by the observational conditions of the survey due
to residual depth and photometry fluctuations. These systematics
are correctly mitigated in current SV analyses thanks to the sky
masks and corrections to the two-point correlation measurements
validated by stringent null-tests (see e.g., Giannantonio et al. 2015;
Becker et al. 2015; Crocce et al. 2015). However, it will be
increasingly difficult to keep them under control in future studies. For
instance, restrictive sky masks can remove unreliable regions but
often discard hard-won data and do not alleviate the need to treat
spatial systematics in the retained regions. As the depth and sensitivity
of the survey increase, these systematics will become increasingly
significant compared to statistical errors. The power of masking or
current correction techniques is also limited since they rely on
templates and contamination models which are not validated against
simulations.

One approach to resolve these issues, i.e., to assess the
significance of systematics and validate the techniques to mitigate them,
is to resort to realistic image simulations. All the tests and
analyses of this paper were performed in parallel on galaxy samples
obtained by processing the BCC-UFig in the same way as the SV data
and applying the same quality and selection cuts. These simulated
galaxy samples include spatial systematics since the image
simulations incorporate the actual SV observing conditions. Even with
the approximation of simulating coadd images instead of
single-epochs, we found that the principal effects of spatial systematics
observed in the galaxy densities and redshift distributions were
successfully reproduced by the BCC-UFig galaxy samples.
Furthermore, the data and the simulation agreed quantitatively in many
cases, showing that the current BCC-UFig simulation, even with
known limitations, is sufficiently realistic to study a range of
effects. The availability of the ground truth in the simulation (e.g., the
true redshifts) allowed us to quantify the significance of the
systematic density and redshift fluctuations for the first time, and to
demonstrate that the observing condition maps capture
systematics such as depth fluctuations and stellar contamination. Pursuing
this route will be essential for future DES studies, since these
fluctuations will have to be carefully characterised and mitigated.
Future versions of the BCC-UFig simulation will be more
realistic and reproduce spatial systematics at higher resolution.
Combining them with high-resolution maps of the observing conditions
and the effective transfer function measured by Balrog (Suchyta
et al. 2015) will allow us to fully exploit the potential of DES data
for cosmological studies. These complementary avenues will be
essential to correctly interpret the deep, high-cadence data delivered
by the Large Synoptic Survey Telescope (LSST), where both the
statistical power and the impact of the observing conditions will be
increased by many orders of magnitude (e.g., LSST Science
Collaboration et al. 2009; LSST Dark Energy Science Collaboration
2012; Jee & Tyson 2011; Carroll et al. 2014).